Module 1
Introduction to Databases
Databases and database technology play a major role in modern computing. Almost every field that uses computers depends on databases to store, manage, and retrieve data efficiently.
Databases are used in:
- Business and Banking systems
- Electronic Commerce (Amazon, Flipkart, etc.)
- Engineering and Scientific research
- Medicine and Healthcare systems
- Education and University systems
- Law, Government, and Library science
Without databases, managing large volumes of data would be slow, error-prone, and inefficient.
Database
Definition
A database is a collection of related data. By data, we mean known facts that can be recorded and have an implicit meaning.
Example of data:
- Names of people
- Phone numbers
- Addresses
- Student marks
These facts become useful information when they are stored in an organized manner.
(Simplified Database System)
Implicit Properties of a Database
A database has several important implicit properties:
1. Representation of the Real World
A database represents some aspect of the real world known as:
- Miniworld
- Universe of Discourse (UoD)
Any change in the real world (miniworld) is reflected in the database.
Example: If a student changes address, the database record is updated.
2. Logical Coherence
A database is a logically coherent collection of data. Random or unrelated data cannot form a database.
3. Designed for a Specific Purpose
Every database is:
- Designed
- Built
- Populated
for a specific group of users and applications.
Database Summary
A database can be summarized as:
- It has a source (miniworld)
- It interacts with real-world events
- It has an audience (users and applications)
Databases evolve over time as the real world and requirements change.
Size and Complexity of a Database
Databases can vary greatly in size and complexity.
Small Databases
- Few hundred records
- Simple structure
- Example: Personal contact list
Large Databases
- Millions of records
- Highly complex structure
- Example: Amazon database
Amazon database stores data for millions of products such as books, electronics, clothing, etc.
Manual vs Computerized Database
Manual Database
- Maintained using paper records
- Time-consuming and error-prone
- Example: Telephone directory
Computerized Database
- Maintained using computers
- Fast data access and update
- Can be managed by programs or DBMS
Large and complex databases always require a Database Management System.
Database Management System (DBMS)
Definition
A Database Management System (DBMS) is a collection of programs that enables users to create, maintain, and access a database.
DBMS acts as an interface between:
- The database
- The users
- Application programs
Functions of DBMS
1. Defining the Database
DBMS allows defining:
- Data types
- Data structures
- Constraints
This information is stored in a database catalog / data dictionary, also known as metadata.
2. Constructing the Database
Data is stored on a physical storage device controlled by the DBMS.
3. Manipulating the Database
- Retrieving data (queries)
- Updating data
- Generating reports
4. Sharing the Database
Multiple users and applications can access the database simultaneously.
Additional DBMS Functions
Protection
- Protection from hardware failures
- Protection from software crashes
- Security against unauthorized access
Maintenance
Databases are long-term systems. DBMS supports:
- Schema changes
- Requirement evolution
- Data growth
A database + DBMS together form a database system.
Example: UNIVERSITY Database
Consider a UNIVERSITY database used to store information about students, courses, and grades.
Files in the Database
- STUDENT - stores student details
- COURSE - stores course information
- SECTION - stores course section details
- GRADE_REPORT - stores student grades
- PREREQUISITE - stores course prerequisites
Each file contains records of the same type and is logically related to others.
| Course_name | Course_number | Credit_hours | Department |
|---|---|---|---|
| Intro to Computer Science | CS1310 | 4 | CS |
| Data Structures | CS3320 | 4 | CS |
| Discrete Mathematics | MATH2410 | 3 | MATH |
| Database | BCS403 | 3 | CS |
| Section_identifier | Course_number | Semester | Year | Instructor |
|---|---|---|---|---|
| 92 | BCS403 | Fall | 04 | Pruthviraj |
| 85 | MATH2410 | Fall | 04 | Rajshekhar Hammigi |
| 102 | CS3320 | Spring | 05 | G C DIVYA |
| 112 | MATH2410 | Fall | 05 | Rajshekhar Hammigi |
| 119 | CS1310 | Fall | 05 | SAMEER B |
| 135 | CS3380 | Fall | 05 | MANJUNATH K G |
| Student_number | Section_identifier | Grade |
|---|---|---|
| 17 | 112 | B |
| 17 | 119 | C |
| 8 | 85 | A |
| 8 | 92 | A |
| 8 | 102 | B |
| 8 | 135 | A |
| Course_number | Prerequisite_number |
|---|---|
| CS3380 | CS3320 |
| CS3380 | MATH2410 |
| CS3320 | CS1310 |
Defining a UNIVERSITY Database
Defining a database means specifying the structure of records stored in each file and the data elements contained in those records.
Record Structure Specification
Each file in the UNIVERSITY database contains records of a specific type. The attributes (fields) of each record must be clearly defined.
- STUDENT: Name, Student_number, Class, Major
- COURSE: Course_name, Course_number, Credit_hours, Department
- SECTION: Section_identifier, Course_number, Semester, Year, Instructor
- GRADE_REPORT: Student_number, Section_identifier, Grade
- PREREQUISITE: Course_number, Prerequisite_number
Data Type Specification
Each data element must have an appropriate data type.
- Name → string of alphabetic characters
- Student_number → integer
- Course_number → string
- Credit_hours → integer
- Grade → character
Constructing the UNIVERSITY Database
Constructing the database involves storing actual data as records in the appropriate files defined earlier.
Data Storage
- Each student is stored as a record in the STUDENT file
- Each course is stored as a record in the COURSE file
- Each section is stored as a record in the SECTION file
- Each grade entry is stored in the GRADE_REPORT file
- Each prerequisite relationship is stored in the PREREQUISITE file
Relationships Between Records
Records in different files are logically related.
- A student may have multiple grade records
- A course may have multiple sections
- A course may have multiple prerequisites
These relationships allow meaningful retrieval of information from the database.
Manipulating the UNIVERSITY Database
Database manipulation involves querying and updating the database.
Query Operations
Examples of database queries include:
- Retrieve the transcript of a student
- List all courses taken by a student in a particular year
- Find grades obtained by a student in each section
- List the prerequisites of a given course
Update Operations
Examples of updates include:
- Insert a new student record
- Update a student’s grade
- Add a new course or section
- Delete obsolete records
Queries and updates must be specified precisely using the DBMS query language (such as SQL).
Database Design Process
Designing a database follows a structured process to ensure accuracy, flexibility, and efficiency.
1. Requirements Specification & Analysis
- Identify user needs
- Understand data requirements
- Document requirements in detail
2. Conceptual Design
Represents data at a high level using models such as the ER model.
3. Logical Design
Converts conceptual design into a data model supported by a DBMS (e.g., relational model).
4. Physical Design
- Defines storage structures
- Specifies access paths
- Optimizes performance
The database is then implemented, populated, and continuously maintained.
Database Approach vs File Processing Approach
In traditional file processing, each department maintains its own files and applications.
Problems with File Processing Approach
- Data redundancy
- Data inconsistency
- Wasted storage space
- Difficult maintenance
Database Approach
- Single centralized database
- Data defined once and shared
- Reduced redundancy
- Improved data consistency
A DBMS ensures controlled access to shared data by multiple users.
Characteristics of the Database Approach
- Self-describing nature of a database system
- Program-data independence
- Data abstraction
- Support for multiple views
- Multi-user transaction processing
Self-Describing Nature of a Database System
A database system stores not only the data but also the description of the database structure.
This description is called metadata and is stored in the system catalog.
- Structure of files
- Data types of fields
- Constraints on data
The system catalog is used by both users and the DBMS software to correctly interpret stored data.
Program - Data Independence and Data Abstraction
Program - Data Independence
In DBMS, data structure definitions are separated from application programs.
Changes in data structure do not require modification of application programs.
Program - Operation Independence
Operations are defined separately from their implementation. Applications use operation names without knowing how they are implemented.
Data Abstraction
Data abstraction hides storage and implementation details from users.
A data model provides a conceptual view using objects, attributes, and relationships.
Support of Multiple Views of the Data
A database typically has many users, and each user may require a different view of the database.
A view is a subset of the database or a virtual representation derived from the database that is not explicitly stored.
Need for Multiple Views
- Different users have different data requirements
- Users should see only relevant data
- Security and simplicity are improved
Example
In a UNIVERSITY database, one user may be interested only in viewing and printing student transcripts.
This user's view contains only student details, courses, and grades, even though the database stores much more information.
A multiuser DBMS must provide facilities to define and manage such views.
Sharing of Data and Multiuser Transaction Processing
A multiuser DBMS allows multiple users to access the database simultaneously.
This is essential when data for multiple applications is integrated and maintained in a single database.
Concurrency Control
The DBMS must include concurrency control mechanisms to ensure correct execution when multiple users update the same data.
Example
In an airline reservation system, multiple agents may try to assign the same seat.
The DBMS ensures that each seat is assigned to only one passenger.
OLTP Applications
Applications that involve frequent, short database transactions are called Online Transaction Processing (OLTP) applications.
A major role of a multiuser DBMS is to ensure that concurrent transactions execute correctly and efficiently.
Transaction Concept and Properties
A transaction is an executing program or process that includes one or more database operations such as reading or updating data.
Transaction Properties
Atomicity
Ensures that all operations of a transaction are executed completely or none are executed.
Isolation
Ensures that each transaction appears to execute independently of other concurrent transactions.
Even when many transactions execute simultaneously, their effects do not interfere with each other.
Database Users
Database users can be broadly classified into different categories based on their interaction with the database system.
Main Categories
- Users who use and control database content
- Users who design, develop, and maintain database applications
- Users who design and maintain DBMS software and systems
Actors on the Scene
1. Database Administrator (DBA)
The DBA is the chief administrator responsible for managing the database system.
- Authorizes user access
- Monitors database usage
- Handles security and performance issues
- Plans hardware and software upgrades
2. Database Designers
Responsible for identifying data to be stored and organizing it efficiently.
- Interact with users
- Develop user-specific views
- Ensure database supports all user requirements
3. End Users
End users access the database for querying, updating, and reporting.
Types of End Users
Casual End Users
Use the database occasionally and require different information each time.
Naive / Parametric End Users
Use predefined transactions frequently.
- Bank tellers
- Reservation clerks
Sophisticated End Users
Use DBMS facilities to develop their own applications.
Stand-Alone Users
Maintain personal databases using ready-made software packages.
System Analysts and Application Programmers
System Analysts
Analyze user requirements and design specifications for database applications.
Application Programmers
Implement, test, document, and maintain application programs.
Workers Behind the Scene
1. DBMS System Designers and Implementers
Design and implement DBMS software components.
- Catalog management
- Query processing
- Concurrency control
- Recovery and security
2. Tool Developers
Develop tools for database design, modeling, and performance tuning.
3. Operators and Maintenance Personnel
Responsible for running and maintaining the hardware and software environment.